Learning Objectives

After completing this lesson, you’ll be able to:

Define schema.
Explain the basics of how the SchemaScanner transformer works in dynamic workflows.
Understand the scenarios where the SchemaScanner is useful.

Introduction

The SchemaScanner lets you extract and manipulate the schema of your datasets, which helps with dynamic workspace issues such as schema standardization and schema drift. How does it work? The SchemaScanner produces a list attribute containing each attribute’s name and data type. Downstream in your workspace, you can use this list attribute to modify the schema and build flexible workflows: instead of defining a fixed schema on your writers, you define the schema at runtime from the list attribute. This makes quality assurance and schema drift handling easier.

What is a Schema?

A schema, sometimes referred to as the "data model," can be described as the structure of a dataset or, more accurately, a formal definition of a dataset’s structure.

Each dataset has its own schema, which includes feature types, permitted geometries, user-defined attributes, and other rules that define or restrict its content. However, for most users, the most important aspects of a schema are its attribute names and data types.

Using the SchemaScanner

The SchemaScanner processes features and retrieves their schema by scanning for attribute names and their data types. It can scan either all features or only a specified number of them. You can also exclude attributes with a regular expression to keep the schema output clean.
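
To make the scanning idea concrete, here is a minimal conceptual sketch in plain Python. It is not the SchemaScanner's implementation and does not use any FME API: features are modelled as dictionaries, and the function name scan_schema, its parameters, and the "mixed" and "unknown" type labels are all hypothetical names chosen for illustration.

```python
import re

def scan_schema(features, max_features=None, ignore_pattern=None):
    """Collect attribute names and Python type names from dict-like features.

    Hypothetical helper for illustration only; not part of FME.
    """
    ignore = re.compile(ignore_pattern) if ignore_pattern else None
    types = {}  # dicts preserve insertion order, so first-seen order is kept
    for index, feature in enumerate(features):
        # Stop once the requested number of features has been scanned.
        if max_features is not None and index >= max_features:
            break
        for name, value in feature.items():
            # Skip attributes whose name matches the exclusion pattern.
            if ignore and ignore.search(name):
                continue
            current = type(value).__name__ if value is not None else None
            seen = types.get(name)
            if seen is None:
                types[name] = current
            elif current is not None and current != seen:
                # Conflicting types across features are flagged, not guessed.
                types[name] = "mixed"
    return [{"name": n, "data_type": t or "unknown"} for n, t in types.items()]

# Sample data (made up for this sketch).
features = [
    {"id": 1, "name": "Main St", "_tmp_flag": True},
    {"id": 2, "name": "Oak Ave", "length_m": 120.5},
    {"id": "3", "name": "Elm Rd"},
]

# Scan only the first two features and drop attributes starting with "_tmp".
schema = scan_schema(features, max_features=2, ignore_pattern=r"^_tmp")
for entry in schema:
    print(entry["name"], entry["data_type"])
```

Note the trade-off the sketch exposes: scanning only the first two features is faster, but it misses the third feature, where id is a string rather than an integer. Scanning every feature would report that attribute as "mixed".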

The result is a new schema feature, which is output via the <Schema> output port. This feature is also given the special attribute and value fme_schema_handling = ‘schema_only’, which allows a dynamic writer to recognize it as a schema feature. If you wish to continue using the original input features, they are passed through the Output port.
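
As a rough mental model, a schema feature can be pictured as a record that carries the schema list plus the fme_schema_handling marker. The sketch below uses plain Python dictionaries rather than FME objects. The list attribute key names (attribute{0}.name, attribute{0}.fme_data_type), the data type values, and both helper functions are assumptions made for illustration; check the SchemaScanner documentation for the exact names your FME version produces.

```python
def make_schema_feature(entries):
    """Build a dict that stands in for a schema feature (hypothetical helper)."""
    # The marker attribute named in the lesson text.
    feature = {"fme_schema_handling": "schema_only"}
    for i, entry in enumerate(entries):
        # Assumed list attribute naming: attribute{i}.name / attribute{i}.fme_data_type
        feature[f"attribute{{{i}}}.name"] = entry["name"]
        feature[f"attribute{{{i}}}.fme_data_type"] = entry["data_type"]
    return feature

def is_schema_feature(feature):
    """Mimic how a consumer could tell schema features from data features."""
    return feature.get("fme_schema_handling") == "schema_only"

# Example data types are illustrative values, not scanned output.
schema_feature = make_schema_feature([
    {"name": "id", "data_type": "fme_int32"},
    {"name": "name", "data_type": "fme_varchar(50)"},
])
data_feature = {"id": 1, "name": "Main St"}

# Separate a mixed stream into schema features and data features.
stream = [schema_feature, data_feature]
schema_features = [f for f in stream if is_schema_feature(f)]
data_features = [f for f in stream if not is_schema_feature(f)]
print(len(schema_features), len(data_features))
```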

For more technical information on the SchemaScanner, check out the documentation.

Why you might want to use the SchemaScanner in your workspace:

You want to ensure your dynamic writer is receiving a valid schema
You don’t know the schema of incoming data and want to make sure it meets certain standards before being used in a dynamic writer
You want to modify the schema before it reaches the writer
You want to expose the schema for validation purposes (checking for schema drift)
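
The validation scenario in the last item can be sketched as a comparison between an expected schema and a scanned one. This is a conceptual Python example, not an FME workflow: both schemas are simple name-to-type mappings, and the function find_schema_drift and the sample attribute names and types are hypothetical.

```python
def find_schema_drift(expected, scanned):
    """Compare two {attribute_name: data_type} mappings and report differences."""
    expected_names = set(expected)
    scanned_names = set(scanned)
    return {
        # Attributes the standard requires but the incoming data lacks.
        "missing": sorted(expected_names - scanned_names),
        # Attributes present in the incoming data but not in the standard.
        "unexpected": sorted(scanned_names - expected_names),
        # Attributes present in both whose data type differs: (expected, scanned).
        "type_changes": {
            name: (expected[name], scanned[name])
            for name in sorted(expected_names & scanned_names)
            if expected[name] != scanned[name]
        },
    }

# Sample schemas (made up for this sketch).
expected = {"id": "int", "name": "str", "length_m": "float"}
scanned = {"id": "str", "name": "str", "surface": "str"}

drift = find_schema_drift(expected, scanned)
has_drift = any(drift.values())
print(drift)
```

A report with three empty entries means the incoming schema matches the standard, so the data could be passed on to the dynamic writer unchanged.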

Key things to remember:

When used in a dynamic workspace, the SchemaScanner should output schema features before data features. Use the Output Schema Features Before Data parameter to do this.
Most schemas contain attributes you don’t need in your final output dataset, so be sure to use the Ignore Attributes Containing parameter to exclude them.